Abstract

Interest point detection is a common task in various computer vision
applications. Although a wide variety of detectors have been developed so far,
the computational efficiency of interest point based image analysis remains a
problem. This paper proposes a system-theoretic approach to interest point
detection. Starting from an analysis of the interdependency between detector
and descriptor, it is shown that, given a descriptor, it is possible to
introduce the notion of detector redundancy. Furthermore, for each detector it
is possible to construct an irredundant and equivalent modification. The
modified detector has lower computational complexity and is therefore
preferable. It is also shown that several known approaches to reducing the
computational complexity of image registration can be generalized in terms of
the proposed theory.

Keywords: interest point detection, image registration

1. Introduction

In many computer vision and multimedia retrieval applications images are
represented as sets of distinctive regions called interest points or keypoints.
To select such regions, an image is processed with detectors that usually
apply specific local operators to the image and select pixels with high
response values. Due to their local nature keypoints possess attractive
properties, such as stability under various image transforms. Compared to
low-level global image features, for instance color features, interest points
are more reliable. Detected points are characterized by descriptors: vectors
that fulfill several conditions, the most important of which are invariance to
the desired image transforms and distinctiveness.

The detection-description image processing scheme has proved quite effective
in practice. It has been used in a broad range of applications including
content-based image and video retrieval, image registration, stereo
reconstruction, robotic navigation, medical imaging, object recognition,
copyright infringement detection, computational photography and others.
Probably the most successful approach to interest point detection and
description proposed so far is Lowe's scale invariant feature transform
(SIFT) [9]. Surveys of modern detectors and descriptors can be found in
[11, 1].

However, challenges remain concerning the computational efficiency of these
image processing methods and the quality of their results. First, detectors
produce a large number of keypoints, around 2 000 for typical images [6]. This
makes it hard to implement scalable image processing systems, given the
computational complexity of descriptor calculation and the required storage
capacity. For example, web-scale image retrieval systems have to handle
collections containing billions of images; storing SIFT descriptors (each
comprising 128 floating point values) for 1 billion images with 2 000
keypoints each would require over 1 000 terabytes of storage. Developing
reliable retrieval for large-scale collections is evidently a difficult
problem too. Another example is real-time tasks such as robot vision or
interactive tomography. In these scenarios processing time is restricted and
common interest point based methods are hardly applicable. Nevertheless, the
emergence of a variety of hardware implementations of SIFT demonstrates the
demand for such methods in the real-time problem domain.
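
The storage figure quoted above can be checked with a few lines of
arithmetic. The sketch below assumes 4-byte single-precision values and
decimal terabytes (10^12 bytes); neither assumption is stated in the text, and
the function name is illustrative only.

```python
# Back-of-the-envelope storage estimate for raw SIFT descriptors.
# Assumptions (not stated in the text): 4-byte floats, 1 TB = 10**12 bytes.
BYTES_PER_VALUE = 4
VALUES_PER_DESCRIPTOR = 128
KEYPOINTS_PER_IMAGE = 2_000
IMAGES = 1_000_000_000


def descriptor_storage_bytes(images, keypoints, values, bytes_per_value):
    """Total bytes needed to store one descriptor per keypoint per image."""
    return images * keypoints * values * bytes_per_value


total_bytes = descriptor_storage_bytes(
    IMAGES, KEYPOINTS_PER_IMAGE, VALUES_PER_DESCRIPTOR, BYTES_PER_VALUE
)
total_terabytes = total_bytes / 10**12
print(f"{total_terabytes:.0f} TB")
```

Under these assumptions the total is slightly above 1 000 terabytes, which
agrees with the estimate in the text; double-precision values would double it.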

Second, evaluating the quality of interest points is a debatable topic. Theory
guarantees that points found in a reference image will be redetected if the
image undergoes a specific transform (usually a similarity or affine transform
of image geometry and a monotonic intensity change), and that descriptors of
corresponding points will be identical. However, when it comes to actual
images, the assumption about the transform type is often violated. There are
many reasons for this: the three-dimensional nature of scenes, occlusions,
complex motion, multiple and moving light sources, sensor distortions, noise,
lossy compression, complicated editing effects and others. Therefore empirical
studies are required to assess the actual quality of image analysis. Several
experimental methodologies can be found in the literature [11, 13, 16, 17].
However, most of them perform passive, or post factum, evaluation: the results
of such experiments are numerical scores that cannot be directly employed to
improve a method's quality. More recent works [15, 13, 16] suggest active
evaluation, which is done during method execution in order to predict the
quality of analysis: the results of active evaluation can easily be used to
reject low quality points.

This paper shows that, although existing active evaluation approaches differ
considerably, it is possible to develop a generalized system-theoretic
framework for quantitative evaluation of interest point detectors. The
remaining sections are organized as follows. Section 2 provides a theoretical
background on interest point detection and description. The proposed framework
is described in Sect. 3. Section 4 describes practical applications of the
developed theory. Finally, Sect. 5 concludes the paper and outlines directions
of further research.

2. Interest Point Detection and Description 

In the context of this paper an image is defined as a nonnegative smooth
bounded function of two variables $I(X)$, where $X = (x, y) \in \mathcal{X}$
and $\mathcal{X}$ is a bounded and connected set. Let us denote by $\{I\}$ the
set of all images depicting the same physical object. Considering two images
$I(X)$ and $I'(X)$ of this set, their respective points $X$ and $X'$ are
called corresponding iff they are projections of the same point of the
physical object. Evidently, once corresponding points are known, an
approximate transform between the images can be computed. This statement
motivates the use of interest points for image matching and registration. If
all points of the images are treated as equivalent, establishing the
correspondence requires an exhaustive search, which is prohibitive. Therefore
it is necessary to introduce an interest point selection technique.

Let us define an interest point detector as an operation
$\Phi : \{I\} \to 2^{\mathcal{X}}$ associating an image $I$ with a set
$\mathfrak{X} \in 2^{\mathcal{X}}$ fulfilling the following conditions:

1. There exists a finite algorithm that implements operation $\Phi$.

2. For each image $I$ the set $\mathfrak{X} = \Phi(I)$ is finite.

3. For each pair of images $I_1$ and $I_2$ the sets $\mathfrak{X}_1$ and
$\mathfrak{X}_2$ consist of corresponding points:
$\forall X_1 \in \mathfrak{X}_1\ \exists!\, X_2 \in \mathfrak{X}_2$ and
$\forall X_2 \in \mathfrak{X}_2\ \exists!\, X_1 \in \mathfrak{X}_1$ such that
$X_2 = F(X_1)$, where $F(X)$ is a transform between $I_1$ and $I_2$.

Points $X \in \mathfrak{X}$ are called interest points. In practice interest
points are extrema of some differential operator applied to the image
function $I$.

Once the sets of interest points $\mathfrak{X}_1$ and $\mathfrak{X}_2$ have
been computed for images $I_1$ and $I_2$, the correspondences have to be
established. For each interest point $X_1 \in \mathfrak{X}_1$ we have to find
the point $X_2 \in \mathfrak{X}_2$ such that $X_2 = F(X_1)$. The unique
existence of $X_2$ is guaranteed by the definition of a detector. An
exhaustive search over the finite set $\mathfrak{X}_2$ is sufficient to select
$X_2$. The question of whether the points $(X_1, X_2)$ correspond or not is
answered on the basis of interest point descriptors [9, 14, 11].

Let us consider a metric space $D$ and denote its metric by $\rho_D$. An
operation $\Psi : \{I\} \times \mathcal{X} \to D$ is called an interest point
descriptor if the following conditions are satisfied:

1. There exists a finite algorithm implementing operation $\Psi$.

2. There exists an $\varepsilon_D > 0$ such that for any images $I_1$ and
$I_2$ and any pair of corresponding points $X_1 \in \mathfrak{X}_1$ and
$X_2 \in \mathfrak{X}_2$ the following relationship necessarily holds:

$$\rho_D\bigl(\Psi(I_1, X_1),\, \Psi(I_2, X_2)\bigr) < \varepsilon_D, \qquad (1)$$

and its violation precludes correspondence between points $X_1$ and $X_2$.

The value $\Psi(I, X) \in D$ is called a description of point $X$ of an image
$I$. Interest points for which relationship (1) holds are called
corresponding in terms of descriptor $\Psi$. The metric space $D$ is called a
description space.
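
The exhaustive search over the second point set, combined with the threshold
test on the description metric, can be sketched as follows. This is only an
illustration: the Euclidean distance stands in for the metric on the
description space, and the function names and the threshold argument `eps_d`
are hypothetical, not taken from the paper.

```python
import math


def euclidean(a, b):
    """Stand-in for the metric on the description space (an assumption)."""
    return math.dist(a, b)


def match_exhaustive(desc1, desc2, eps_d, metric=euclidean):
    """Return index pairs (i, j) whose descriptions are closer than eps_d.

    desc1, desc2: sequences of descriptor vectors for the two images.
    Every pair is tested, so the cost is len(desc1) * len(desc2) metric calls.
    """
    matches = []
    for i, d1 in enumerate(desc1):
        for j, d2 in enumerate(desc2):
            if metric(d1, d2) < eps_d:
                matches.append((i, j))
    return matches
```

The quadratic number of metric evaluations in this loop is the cost that the
theory developed below aims to reduce by discarding redundant points before
matching.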

A theoretically well-founded way to implement an interest point descriptor is
to use a truncated Taylor series (N-jet) or directional filter banks [8].

The formal definitions presented in this section are usually presumed in
interest point based image analysis. However, numerical implementations of
detectors and descriptors evidently violate the strict formal conditions. The
major cause of the violation lies in the discrete nature of images and
computation. In practice sets of interest points are redundant:
correspondence can be established only between small subsets of the interest
points. In the following section a theory of irredundant interest point
detection is developed.

3. System-Theoretic Approach to Image Interest Point Detection

Traditionally, detection and description of interest points are considered as
isolated and independent stages of image processing. It is therefore
impossible to draw conclusions about the redundancy of points during the
detection stage, because redundancy becomes apparent only after description
and matching have been performed. Since no knowledge about the descriptor is
available to the detector, redundancy cannot be evaluated. This work proposes
to use a system-theoretic approach to unveil the interdependence between
detection and description. Using knowledge about this interdependence it is
possible to construct irredundant detectors.

Let us begin by defining an approximate interest point detector, called a
$\lambda$-correct detector. Given $\lambda > 0$, a detector
$\Phi : \{I\} \to 2^{\mathcal{X}}$ is called a $\lambda$-correct detector if
the following condition is satisfied: for each transform $F(X)$ and each point
$X' \in \Phi(I(X))$ there exists an interest point $X'' \in \Phi(I(F(X)))$
such that the inequality $\|F(X') - X''\| < \lambda$ holds. Interest points
$X \in \Phi(I(X))$ are called $\lambda$-correct interest points¹. The value of
$\lambda$ is called the correctness level. A $\lambda$-correct detector is
denoted by $\Phi_\lambda$. The notion of $\lambda$-correctness allows a
mathematically rigorous expression of the interdependency between interest
point detectors and descriptors. On the basis of the system-theoretic approach
it is possible to build a qualitatively new interest point detection theory.

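Let us call an interest point descriptor $\Psi$ a continuous descriptor if
$\forall F$ and $\forall \varepsilon > 0$ $\exists \delta > 0$ such that

$$\forall X', X'',\ \|F(X') - X''\| < \delta:\quad \rho_D\bigl(\Psi(I, X'),\, \Psi(I \circ F, X'')\bigr) < \varepsilon. \qquad (2)$$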

Descriptors built upon derivatives of the image function possess the above
property by virtue of the smoothness of $I$ and the presumed continuity of the
transform $F$.

Resorting to the definition of a descriptor, consider
$\varepsilon = \varepsilon_D$. Then there exists $\delta_D > 0$ such that
condition (2) holds. Consider now $\Phi_{\delta_D}$, a $\delta_D$-correct
interest point detector. By the definition of $\Psi$ it follows that for all
interest points detected with $\Phi_{\delta_D}$ corresponding points will be
found. Hereinafter such a detector will be referred to as $\Phi_\Psi$.

Consider images $I'$ and $I''$, a transform $F$ between them and a descriptor
$\Psi$. The set of $\Psi$-irredundant correspondences for an interest point
detector $\Phi_\lambda$ is the set

$$S^{\Psi}_{\lambda}(I', I'') = \{(X', X'') \mid X' \in \Phi_\lambda(I'),\ X'' \in \Phi_\lambda(I''),\ \|F(X') - X''\| < \delta_D\}.$$

This set consists of the interest point pairs $(X', X'')$ that correspond in
terms of descriptor $\Psi$.

On the basis of $\Psi$-irredundancy it is possible to define an equivalence
relation between interest point detectors. Detectors $\Phi_{\lambda_1}$ and
$\Phi_{\lambda_2}$ are called $\Psi$-equivalent
($\Phi_{\lambda_1} \sim \Phi_{\lambda_2}$) if for any images $I'$, $I''$ the
following equality holds:

$$S^{\Psi}_{\lambda_1}(I', I'') = S^{\Psi}_{\lambda_2}(I', I''). \qquad (3)$$

$\Psi$-equivalence of two detectors means that for any point that is not
detected by both detectors there is no corresponding point in terms of
descriptor $\Psi$.

¹ Interest point repeatability is a related concept used in the literature
[11, 1].
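
For a single image pair with a known transform, the set of irredundant
correspondences and the equality test behind detector equivalence can be
sketched as below. This is a hedged illustration, not the paper's procedure:
the definition requires the equality for any images, whereas the code checks
one pair only, and all function and parameter names are hypothetical. Points
are assumed to be coordinate tuples.

```python
import math


def irredundant_correspondences(points1, points2, transform, delta_d):
    """Pairs (p, q) with ||transform(p) - q|| < delta_d, as a set of tuples."""
    return {
        (p, q)
        for p in points1
        for q in points2
        if math.dist(transform(p), q) < delta_d
    }


def equivalent_on_pair(det_a, det_b, transform, delta_d):
    """Compare two detectors on one image pair.

    det_a, det_b: (points_in_first_image, points_in_second_image) tuples
    holding the outputs of each detector on the same two images.
    """
    s_a = irredundant_correspondences(det_a[0], det_a[1], transform, delta_d)
    s_b = irredundant_correspondences(det_b[0], det_b[1], transform, delta_d)
    return s_a == s_b
```

Two detectors that differ only in points without a partner within `delta_d`
produce the same set and therefore pass this check.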

Proposition 1. The $\Psi$-equivalence relation is an equivalence relation.

Proof. Reflexivity, symmetry and transitivity are inherited from the set
equality relation. □

The concept of $\Psi$-equivalence makes it possible to consider detector
equivalence classes. For any two detectors belonging to the same class,
interest point matching results obtained using descriptor $\Psi$ will
evidently coincide.

Given a $\lambda_1$-correct detector $\Phi_{\lambda_1}$ and a
$\lambda_2$-correct detector $\Phi_{\lambda_2}$, $\Phi_{\lambda_1}$ is called
embedded in $\Phi_{\lambda_2}$ ($\Phi_{\lambda_2}$ contains
$\Phi_{\lambda_1}$), denoted $\Phi_{\lambda_1} \subset \Phi_{\lambda_2}$, if
$\lambda_1 < \lambda_2$ and for any image $I \in \{I\}$ the following
relationships hold: $\Phi_{\lambda_1}(I) \subseteq \Phi_{\lambda_2}(I)$, and
for each point $X \in \Phi_{\lambda_2}(I) \setminus \Phi_{\lambda_1}(I)$ and
each point $X' \in \Phi_{\lambda_2}(I \circ F)$ the inequality
$\|F(X) - X'\| > \lambda_1$ holds.

Theorem 1. For any $\lambda$-correct interest point detector $\Phi_\lambda$
and any continuous interest point descriptor $\Psi$ there exists a
$\delta_D$-correct interest point detector $\Phi_\Psi \subset \Phi_\lambda$
such that $\Phi_\Psi \sim \Phi_\lambda$.

Proof. Resorting to the definition of a continuous descriptor, consider the
value of $\delta_D$. Two alternatives are available for detector
$\Phi_\lambda$:

1. $\lambda \le \delta_D$: in this case $\Phi_\lambda$ is evidently a
$\delta_D$-correct detector and therefore $\Phi_\Psi$ is $\Phi_\lambda$;
$\Phi_\Psi \sim \Phi_\lambda$ because of the reflexivity of the equivalence
relation.

2. $\lambda > \delta_D$: let us describe a way to build $\Phi_\Psi$. Consider
a fixed continuous transform $F$ and an image $I$. Then there exists at least
one interest point $X' \in \Phi_\lambda(I)$ such that for each
$X'' \in \Phi_\lambda(I \circ F)$

$$\|F(X') - X''\| > \delta_D.$$

Let us denote by $\Phi_{\lambda_1}$ the interest point detector such that
$\Phi_{\lambda_1}(I) = \Phi_\lambda(I) \setminus X'$. This detector is
evidently embedded in $\Phi_\lambda$. It can be shown that
$\Phi_\lambda \sim \Phi_{\lambda_1}$. By the definition of
$\Phi_{\lambda_1}$ it is necessary to prove the equivalence condition only
for images $I$ and $I \circ F$. Let us introduce a quantity

$$e_{I,F}(X') = \min_{X'' \in \Phi_\lambda(I \circ F)} \|F(X') - X''\| \qquad (4)$$

called the detection error of $\Phi_\lambda$ at point $X'$². By definition,
for point $X'$ the inequality $e_{I,F}(X') > \delta_D$ holds. Then $X'$ is a
redundant interest point and cannot belong to any of the pairs comprising the
set $S^{\Psi}_{\lambda}(I, I \circ F)$. Hence equivalence condition (3)
follows.
The two alternatives described above are available for detector
$\Phi_{\lambda_1}$ too. Sequentially carrying out a similar analysis we
construct a sequence of detectors
$\{\Phi_{\lambda_0}, \Phi_{\lambda_1}, \Phi_{\lambda_2}, \ldots\}$, where
$\lambda_0 = \lambda$. It can be shown that there exists an element
$\Phi_{\lambda_{i^*}}$ of the sequence such that $\lambda_{i^*} < \delta_D$.
The set $\Phi_\lambda(I)$ is finite by definition, therefore the above
sequence also has a finite number of elements. Consider the singular case in
which $e_{I,F}(X') > \delta_D$ holds for each $X' \in \Phi_\lambda(I)$. Then
in the process of sequence construction all points will be excluded one by
one: $\lambda_{i^*} = 0 < \delta_D$, corresponding to a trivial 0-correct
detector. Otherwise there exists at least one interest point
$X^* \in \Phi_\lambda(I)$ such that $e_{I,F}(X^*) < \delta_D$. Then
$\lambda_{i^*} = e_{I,F}(X^*) < \delta_D$. In general, there can exist a set
of points $\{X^*_1, \ldots, X^*_n\}$ such that $e_{I,F}(X^*_i) < \delta_D$.
In this case it follows that

$$\lambda_{i^*} = \max_{i = 1, \ldots, n} e_{I,F}(X^*_i) < \delta_D.$$

Hence a $\delta_D$-correct interest point detector
$\Phi_\Psi = \Phi_{\lambda_{i^*}}$ exists and can be constructed with the
described procedure. At the same time $\Phi_{\lambda_{i^*}}$ is the last
element of the detector sequence under analysis, because the first
alternative holds for $\Phi_{\lambda_{i^*}}$. The statement
$\Phi_\Psi \sim \Phi_\lambda$ follows from the transitivity of
$\Psi$-equivalence and from the $\Psi$-equivalence, proved above, between any
contiguous elements of the detector sequence:
$\Phi_{\lambda_i} \sim \Phi_{\lambda_{i+1}}$, $i = 0, \ldots, i^* - 1$. □
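
The construction used in the proof can be illustrated for one image pair when
the transform is known. The sketch below filters all redundant points in a
single pass instead of removing them one by one, which is an assumption made
for brevity; it also assumes the detection error is the distance from the
transformed point to the nearest point detected in the second image. All
names are hypothetical, and `points2` must be non-empty.

```python
import math


def detection_error(point, points2, transform):
    """Distance from transform(point) to the nearest point of points2."""
    return min(math.dist(transform(point), q) for q in points2)


def prune_redundant(points1, points2, transform, delta_d):
    """Keep points whose detection error is below delta_d.

    Returns the kept points and the largest detection error among them
    (0.0 if nothing is kept, the singular case mentioned in the proof).
    """
    errors = {p: detection_error(p, points2, transform) for p in points1}
    kept = [p for p in points1 if errors[p] < delta_d]
    level = max((errors[p] for p in kept), default=0.0)
    return kept, level
```

In practice the transform is unknown, so this function is useful only on
test data with ground truth; the proxies discussed in Sect. 4 replace
`detection_error` in real use.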

By means of the $\lambda$-correct detector theory it is possible to compare
existing detectors by measuring their correctness level. However, it is more
important to evaluate the redundancy of a detector in the scope of a given
application. Such an evaluation can be carried out within the developed
framework on the basis of the system-theoretic approach. We have already
shown the interdependency between $\lambda$-correct interest point detectors
and descriptors. To evaluate redundancy it is necessary to know the value of
$\delta_D$ (cf. the definition of descriptor continuity). This value can be
estimated given the value of $\varepsilon_D$, which defines the threshold for
distinguishing interest point descriptions. The latter can in turn be
evaluated on the basis of quality measures specific to the area of
application. For example, the transform approximation error can be a quality
measure for image registration: given the admissible error, the value of
$\varepsilon_D$ can be estimated experimentally. A systematic approach to the
interdependencies between stages of image processing therefore makes it
possible to evaluate the redundancy of interest point detectors.

² The concept of localization accuracy [17, 10] is related to detection error.

4. Practical Applications 

Among the possible practical applications of the $\lambda$-correct detector
theory, this paper focuses on reducing the computational complexity of image
matching.

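Theorem 2. Consider a set of $\Psi$-equivalent interest point detectors
$\{\Phi_{\lambda_k}\}$. Let this set contain $\Phi_{\lambda^*_k}$ such that
$\lambda^*_k > \delta_D$ and $\Phi_{\lambda_{*k}}$ such that
$\lambda_{*k} < \delta_D$. Then an ordering relationship $\prec$ can be
established upon the set $\{\Phi_{\lambda_k}\}$:
$\Phi_{\lambda_1} \prec \Phi_{\lambda_2} \iff \lambda_1 < \delta_D,\ \delta_D < \lambda_2$.
For all $\Phi_{\lambda_{*k}} \prec \Phi_{\lambda^*_k}$ the inequality

$$c(\Phi_{\lambda_{*k}}) < c(\Phi_{\lambda^*_k})$$

holds, where $c(\Phi_\lambda)$ is the number of $\rho_D$ value calculations
required to establish correspondences between interest points detected by
means of $\Phi_\lambda$.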

Proof. The inequality $\lambda^*_k > \delta_D$ means that among the points
detected with $\Phi_{\lambda^*_k}$ there will inevitably be
$\lambda^*_k$-correct interest points that are not $\delta_D$-correct. Let
$n^*$ be the number of such points and $n$ the total number of detected
points. By the definition of $\Psi$, each of these $n^*$ points is redundant.
Therefore calculating the value of $\rho_D$ for such points is redundant too.
Establishing correspondences requires $n(n+1)/2$ calculations of the metric
value:

$$c(\Phi_{\lambda^*_k}) = \frac{n(n+1)}{2}.$$

When correspondences are to be established only between $\delta_D$-correct
points it follows that

$$c(\Phi_{\lambda_{*k}}) = \frac{(n - n^*)(n - n^* + 1)}{2}.$$

The existence of $n^*$ redundant points results in redundant metric value
calculations that have no effect on matching:

$$c(\Phi_{\lambda^*_k}) - c(\Phi_{\lambda_{*k}}) = \frac{n^*(1 + 2n - n^*)}{2} > 0.$$

□
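
The saving derived in the proof is easy to evaluate numerically. The sketch
below uses the same count of n(n+1)/2 metric evaluations for n points; the
figure of 500 redundant points out of 2 000 is a made-up illustration, not a
measurement from the paper.

```python
def metric_evaluations(n):
    """Metric evaluations assumed in the proof for n points: n(n+1)/2."""
    return n * (n + 1) // 2


def redundant_evaluations(n, n_redundant):
    """Evaluations saved by discarding n_redundant of n detected points."""
    return metric_evaluations(n) - metric_evaluations(n - n_redundant)


# Hypothetical example: 2000 detected points, 500 of them redundant.
n, n_star = 2000, 500
saved = redundant_evaluations(n, n_star)
# The difference agrees with the closed form n*(1 + 2n - n*)/2 from the proof.
assert saved == n_star * (1 + 2 * n - n_star) // 2
print(saved, saved / metric_evaluations(n))
```

Because the cost is quadratic in the number of points, removing a quarter of
the points in this example removes well over a quarter of the metric
evaluations.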
Theorem 2 states that the concept of redundant complexity is defined upon
$\Psi$-equivalent interest point detectors. Theorem 1 states that for each
$\lambda$-correct interest point detector $\Phi_\lambda$ it is possible to
construct a $\Psi$-equivalent irredundant detector $\Phi_\Psi$. The property
of $\Psi$-equivalence guarantees that the sets of corresponding interest
points computed using descriptor $\Psi$ together with detectors
$\Phi_\lambda$ and $\Phi_\Psi$ will coincide. Therefore the transform
approximations will coincide too. Furthermore, if the sets of corresponding
points are employed in the application to solve problems other than transform
estimation, the solutions obtained with $\Phi_\lambda$ and $\Phi_\Psi$ will
coincide. The choice between $\Phi_\lambda$ and $\Phi_\Psi$ can thus be based
on their computational complexity: by Theorem 2, $\Phi_\Psi$ is advantageous.

Consider now the question of computing $\Phi_\Psi$ given $\Phi_\lambda$. The
proof of Theorem 1 is constructive, but it cannot be directly employed in
numerical methods, because the detection error function $e_{I,F}(X)$ plays a
significant role in it. To compute values of this function we have to know
the transform $F$ (cf. equation (4)): this requirement prohibits the use of
detection error functions in numerical methods, since the transform is
unknown. Building the detector $\Phi_\Psi$ requires a means of indirect
estimation of the detection error of $\Phi_\lambda$. Note that in the proof
of Theorem 1 the detection error function is used only to test the inequality
$e_{I,F}(X) > \delta_D$. Thus, given some function $e_I(X)$ such that
$e_{I,F}(X) > \delta_D \iff e_I(X) > \delta$, where $\delta$ is a constant,
$e_I(X)$ can replace $e_{I,F}(X)$ without loss of proof validity.

An indirect estimate of the detection error of $\Phi_\lambda$ can be obtained
with different approaches. The straightforward way is to average values of
$e_{I,F}(X)$ precalculated for sample images for which the transform $F$ is
known in advance. Similar procedures were proposed in [11] for the
comparative analysis of several known detectors. However, using this approach
during image registration is inefficient since it requires a multitude of
image transformations.

An alternative is to employ machine learning. This is possible because
testing the inequality $e_I(X) > \delta$ can be seen as a binary
classification problem [12]. The positive class corresponds to
$\delta_D$-correct interest points. Within such a framework interest point
descriptions are to be classified, and the descriptions can be calculated by
means of a descriptor that may differ from $\Psi$ (the descriptor used for
matching). An example of such an approach is the method described in [16].

The function $e_I(X)$ and the constant $\delta$ can also be defined
explicitly. Article [3] proposes to employ Laplacian values to evaluate
interest point quality: $e_I(X) = \Delta I(X)$. The value of $\delta$ is
estimated empirically. However, nothing is known about the correlation
between the error function and the Laplacian.
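
A proxy of this kind can be sketched with a discrete Laplacian. The code
below is an illustration under several assumptions: it uses the five-point
stencil on a grayscale image stored as a list of rows, it takes the text
literally by rejecting points whose Laplacian value exceeds the threshold
(whether [3] thresholds the signed value or its magnitude is not stated
here), and all names are hypothetical.

```python
def laplacian(image, x, y):
    """Five-point discrete Laplacian at interior pixel (x, y); image[y][x]."""
    return (
        image[y - 1][x] + image[y + 1][x]
        + image[y][x - 1] + image[y][x + 1]
        - 4 * image[y][x]
    )


def filter_by_proxy(image, points, delta, proxy=laplacian):
    """Keep points (x, y) whose proxy error does not exceed delta.

    Points must be interior pixels so that all four neighbours exist.
    """
    return [(x, y) for (x, y) in points if proxy(image, x, y) <= delta]
```

Any other indirect estimate, for example a classifier score, could be passed
as `proxy` as long as it accepts the same arguments.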

Finally, visual attention models can be used to estimate detection error.
Several studies have evaluated the repeatability of salient interest points
[13, 2, 5]. Draper and Lionelle [2] compared the NVT model (Neuromorphic
Vision Toolkit) [7] with the SAFE (Selective Attention as a Front End) model
and concluded that SAFE selects interest points with average repeatability
over 90% under similarity transforms. The VOCUS model (Visual Attention
System for Object Detection and Goal-directed Search) [4], also built upon
NVT, was evaluated in [5, 13]. The results show that the ratio of redundant
interest points among salient ones is also lower than 10%. One disadvantage
of visual attention models is their inherent restriction to natural images.

To conclude, there are several successful approaches to reducing image
matching complexity, and these approaches can be generalized in terms of
building an irredundant interest point detector $\Phi_\Psi$. Note that the
proposed theory also makes it possible to carry out evaluative studies
comparing the described approaches.

5. Conclusion 

In this paper a novel interest point detection theory is proposed. By means
of a system-theoretic analysis of the interdependency between interest point
detector and descriptor, an approach is developed that introduces an
equivalence relation between detectors used together with a fixed descriptor.
A formal definition of interest point redundancy makes it possible to prove
the existence of an irredundant detector that can be constructed on the basis
of any given detector. It is shown how the developed theory can be employed
to reduce the computational complexity of interest point matching.

The current approach is centered around the notion of a $\lambda$-correct
interest point detector, which generalizes the known concept of a detector.
Since the concept of a descriptor is left unchanged, further research will be
directed to generalizing the notion of interest point descriptor and applying
the system-theoretic approach to the problem of interest point based image
analysis as a whole.